Binarization-free Text Line Extraction for Historical Manuscripts
Abstract
Large collections of historical manuscripts, which contain valuable information about our cultural heritage, are held in libraries around the world. There has recently been much interest in digitizing them for preservation, since the quality of many manuscripts has deteriorated through exposure to the environment. Digitization, though, is only the first step toward making the information in these manuscripts accessible to researchers and the interested public. It yields only a digital image of the page; further processing steps must be applied during handwriting recognition so that the manuscript’s content is transformed into a form that a computer can interpret. One important step in this process is text line extraction, which aims to extract individual text lines from the manuscript page. In this paper, we propose a binarization-free text line extraction method using seam carving [1]. The main idea is to compute an energy map of the input text blocks and determine minimum energy paths that pass through them. The energy map is constructed so that gaps between text lines have low energy values. A minimum energy path will therefore pass only through these regions and will successfully separate two adjacent text lines. Our algorithm has the following two advantages:
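To illustrate the minimum-energy-path idea described in the abstract, here is a minimal sketch in plain Python. It is not the paper's implementation: the energy function (ink darkness of a grayscale pixel) and the names `energy_map` and `min_energy_horizontal_seam` are assumptions made for illustration, and the abstract does not say how the actual energy map is built. The sketch works directly on grayscale values, with no binarization step, and uses the standard seam-carving dynamic program to find one left-to-right path of minimum total energy.

```python
def energy_map(gray):
    """Assumed energy: ink darkness, so bright gaps between lines score low."""
    return [[255 - px for px in row] for row in gray]


def min_energy_horizontal_seam(energy):
    """Return, for each column, the row of the cheapest left-to-right path.

    The path may move at most one row up or down between adjacent columns.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]  # cumulative cost table

    # Forward pass: add the cheapest of the three reachable predecessors.
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
            cost[r][c] += min(cost[k][c - 1] for k in range(lo, hi + 1))

    # Backward pass: start at the cheapest end point and trace back.
    seam = [0] * cols
    seam[-1] = min(range(rows), key=lambda r: cost[r][-1])
    for c in range(cols - 1, 0, -1):
        r = seam[c]
        lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
        seam[c - 1] = min(range(lo, hi + 1), key=lambda k: cost[k][c - 1])
    return seam


# Toy "page": two dark text bands (value 0) separated by one bright gap row.
dark, bright = [0] * 6, [255] * 6
page = [dark, dark, dark, bright, dark, dark, dark]
seam = min_energy_horizontal_seam(energy_map(page))
print(seam)  # row index of the separating path in each column
```

On this toy input the path stays in the bright gap row between the two dark bands. A real manuscript page would need a smoother, more robust energy map and one seam per pair of neighboring text lines.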
Similar resources
Digital Restoration by Denoising and Binarization of Historical Manuscripts Images
This chapter deals with the digital restoration, preservation, and database storage of historical manuscript images. It focuses on restoration techniques and binarization methods, combined with image processing applied to document images for text and background enhancement and discrimination. Sequential image processing procedures are applied for image refinement and enhancement on quality class categ...
Hybrid Binarization Technique for Historical Manuscripts
This paper presents a new hybrid approach for the binarization and enhancement of historical manuscripts. It deals with degradations that occur due to shadows, non-uniform illumination, low contrast and strain. We follow two distinct methods of binarization, with a pre-processing procedure using an adaptive Wiener filter, a rough estimation of foreground regions and a background surface ca...
Information Extraction from Historical Semi-Structured Handwritten Documents
In this paper, we describe our approach to extract salient events such as birth and death records from historical French parish documents that contain free-form handwritten text. The challenges posed by these documents to the current state of the art in handwriting recognition and information extraction go well beyond the generic challenges in recognizing handwritten text such as style variatio...
Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents
This paper presents a novel approach for text line extraction, based on Gaussian scale space, a dedicated binarization, and an energy minimization framework. It enhances the text lines in the image using a multi-scale anisotropic second-derivative-of-Gaussian filter bank at the average height of the text line. It then applies a binarization, which is based on component-tree and is tailore...
A spatially adaptive statistical method for the binarization of historical manuscripts and degraded document images
In this paper, we present an adaptive method for the binarization of historical manuscripts and degraded document images. The proposed approach is based on maximum likelihood (ML) classification and uses a priori information and the spatial relationship on the image domain. In contrast with conventional methods that use a decision based on thresholding, the proposed method performs a soft decis...